(57) Abstract

An audio decoder is provided for applications in which a reduction in computing power is required. The proposed method forces the multiple output channels to a single inverse-transform format. A long transform length is more suitable for input signals whose spectrum remains stationary or quasi-stationary; it provides greater frequency resolution, improved coding performance and a reduction in the computing power required. Two or more short transform lengths, which offer greater time resolution, are more desirable for signals that change rapidly with time. The computing power required for two or more short transforms is generally higher than for a single transform, so the trade-off between time and frequency resolution should be considered when selecting a transform block length. Advantage is taken of the behaviour of human hearing to reduce the computing power of a processing engine (e.g. a DSP) when downmixing from an M-channel input to a P-channel output is required. The encoder provides spectral information about each transmitted audio signal frame, indicating whether the signals are stationary/quasi-stationary or changing rapidly with time. Some analysis is required to decide which input channels are forced to long or short block conversion prior to frequency-domain downmixing and inverse transformation.
Method and Apparatus for Frequency-Domain Downmixing with Block-Switch Forcing for Audio Decoding Functions

This invention relates generally to audio decoders. More particularly, the present invention relates to multi-channel audio compression decoders with downmixing capabilities.

An audio decoder generally comprises two basic parts: a demultiplexing portion, whose main function is to unpack a serial bit stream of encoded data, which in this case is in the frequency-domain; and a time-domain signal processing portion, which converts the demultiplexed signal back to the time-domain. A multi-channel output section may be provided to cater for multiple output formats. If the number of channels required at the decoder output is smaller than the number of channels encoded in the bit stream, downmixing is required. Present decoders usually downmix in the time-domain. However, since the inverse frequency-domain transform is a linear operation, it is also possible to downmix in the frequency-domain prior to transformation.
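The linearity argument above can be checked numerically. The following Python sketch uses a naive, unnormalised inverse DFT as a stand-in for the decoder's inverse transform (an assumption; practical decoders use a fast, windowed transform), three made-up input channels and hypothetical downmix gains, and compares downmixing after the inverse transform with downmixing before it.

```python
import cmath

def idft(coeffs):
    """Naive, unnormalised inverse DFT of a list of complex coefficients."""
    n_pts = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * n / n_pts)
                for k, c in enumerate(coeffs))
            for n in range(n_pts)]

def mix(channels, gains):
    """Weighted, sample-by-sample sum of equal-length channels."""
    return [sum(g * ch[i] for g, ch in zip(gains, channels))
            for i in range(len(channels[0]))]

# Three hypothetical frequency-domain input channels (M = 3), 8 coefficients each.
left = [1, 0.5j, 0, 0, 0.25, 0, 0, -0.5j]
centre = [0.2, 0, 1 - 1j, 0, 0, 0, 1 + 1j, 0]
right = [0, 0, 0, 0.75, 0, 0.3, 0, 0]
gains = [1.0, 0.7071, 1.0]  # hypothetical downmix coefficients for one output

# Time-domain downmix: M = 3 inverse transforms, then mix.
time_first = mix([idft(ch) for ch in (left, centre, right)], gains)
# Frequency-domain downmix: mix first, then one inverse transform (P = 1).
freq_first = idft(mix([left, centre, right], gains))

max_err = max(abs(a - b) for a, b in zip(time_first, freq_first))
print(f"max difference: {max_err:.2e}")  # rounding-level only
```

Both paths give the same samples up to floating-point rounding, but the second performs P inverse transforms instead of M, which is the saving the invention exploits.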

The encoded data representing the audio signals may convey from one to multiple full-bandwidth channels, along with a low-frequency channel. The encoded data is organised into synchronisation frames, and the way in which the demultiplexing and time-domain signal processing portions are related is a function of the information available in a synchronisation frame. Each frame contains several coded audio blocks, each of which represents a series of audio samples. Further, each frame contains a synchronisation information header to facilitate synchronisation of the decoder, bit stream information to inform the decoder about the transmission mode and options, and an auxiliary data field which may include user data or dummy data. For example, for an AC-3 audio decoder from Dolby Laboratories of San Francisco, California, the auxiliary data field is adjusted by the encoder such that the cyclic redundancy check element falls on the last word of the frame. One cyclic redundancy check word is checked after more than half of the frame has been received, and another after the complete frame has been received, as described in Advanced Television Systems Committee, Digital Audio Compression Standard (AC-3), 20 December 1995. Another example is the MPEG-1 standard audio decoder, where the cyclic redundancy check word is optional for normal operation; however, if the MPEG-2 extension is used, a cyclic redundancy check word is compulsory.

An audio block also contains information on whether the block was split into two or more sub-blocks during the transformation from the time-domain to the frequency-domain. A long block length allows the use of a long transform length, which is more suitable for input signals whose spectrum remains stationary or quasi-stationary. This provides greater frequency resolution, improved coding performance and a reduction in the computing power required. Two or more short-length transforms, used for short block lengths, give greater time resolution and are more desirable for signals whose spectrum changes rapidly with time. The computing power required for two or more short transforms is ordinarily higher than for a single transform. This approach closely parallels known behaviour of human hearing.


Again taking the Dolby AC-3 audio decoder mentioned above as an example, dither, dynamic range, coupling function, channel exponents, bit allocation function, gain, channel mantissas and other parameters are also contained in each block. However, they are represented in a compressed format, and therefore unpacking, table set-up, decoding, expansion, calculations and computations must be performed before the pulse code modulation (PCM) audio samples can be recovered.

The input bit stream for a decoder will typically come from a transmission system (such as HDTV or CTV) or a storage system (e.g. CD, DAT, DVD). Such data can be transmitted continuously or in bursts. The demultiplexing and bit decoding portion of the decoder synchronises to the frame and stores more than half of the frame data before processing starts. The synchronisation word and bit stream information are unpacked only once per frame. The audio blocks are unpacked one by one, and at this stage the blocks containing the new audio samples may differ in length (i.e. the number of bits in each block may differ). However, once the audio blocks are decoded, all audio blocks have the same length. The first audio block contains not only new PCM audio samples but also extra information concerning the complete frame. The remaining audio blocks may contain fewer bits. The bit decoding section performs an unpacking and decoding function, the final product of which is the set of frequency transform coefficients of each channel involved, in a floating-point format (exponents and mantissas) or a fixed-point format.

The time-domain signal processing (TDSP) section first receives the transform coefficients one block at a time. In normal operation, when the signal spectra are relatively stationary and have been transformed to the frequency-domain using a long transform length, a block-switch flag is disabled, and the TDSP uses a 2N-point inverse fast Fourier transform (IFFT) of corresponding long length to obtain N time-domain samples. When fast-changing signals are considered, the block-switch flag is enabled and the signals are transformed to the frequency-domain differently, although the same number of coefficients, N, is transmitted. The TDSP then uses a short-length inverse transform.


Where the audio decoder receives M input channels (M an integer) and produces P output channels, where M>P and P>0, a decoder that downmixes in the time-domain must perform M frequency-domain to time-domain transformations. Since only P output channels are required, a downmixing process is then performed, reducing the number of channels from M to P.


It is an object of the invention to provide an audio decoder which mixes M channels down to P channels in the frequency-domain rather than in the time-domain, where M>P and P>0. This is referred to herein as the block-switch forcing method. Accordingly, the full set of M frequency-domain to time-domain transformations is not required; instead, depending on the type of transform used to produce the frequency-domain signals, the number of these transformations can be reduced from M to P.

WO 98/51126    PCT/SG97/00020

In accordance with the present invention, there is provided a method of audio data decoding, comprising: receiving a data signal and demultiplexing the data signal into a plurality of M frequency-domain input data channels; downmixing said M frequency-domain input channels into P frequency-domain channels, where M>P and P>0, M and P both integers; and selecting an inverse transformation length and performing an inverse transformation of the P frequency-domain channels according to the selected length, so as to produce P audio sample output channels.

The present invention also provides an audio decoder, comprising: a demultiplexer for receiving a data signal and demultiplexing the data signal into a plurality of M frequency-domain input data channels; means for downmixing said M frequency-domain input channels into P frequency-domain channels, where M>P and P>0, M and P both integers; and means for selecting an inverse transformation length and performing an inverse transformation of the P frequency-domain channels according to the selected length, so as to produce P audio sample output channels.

Preferably, the transform length of each of the M frequency-domain input channels is determined. The transform lengths of the input channels may comprise a long or a short transform length, and the relative numbers of long and short transform lengths amongst the M input channels may be utilised to select the inverse transform length for performing the inverse transformation of the P downmixed frequency-domain channels.

In embodiments of the invention, a specific data channel contains a number of 
transform coefficients and information indicating the type of transformation effected in the 
encoding process, such as a transformation involving one long block (referred to as 
"longblock" or "LB" hereafter), or two or more short blocks (referred to as "shortblock" or 
"SB" hereafter) being transformed one after the other. There are several combinations of 
frequency-domain downmixing using the herein described block-switch forcing method: 

(1) If the number of input channels is an even number (M even) and the number of channels comprising longblocks is LB ≤ M/2, then the channels with LB will be converted to shortblock, SB, channels.

(2) If the number of input channels is an even number (M even) and the number of channels comprising longblocks is LB > M/2, then the channels with LB will remain intact.

(3) If the number of input channels is an even number (M even) and the number of channels with shortblocks is SB < M/2, then the channels with SB will be converted to longblock, LB, channels.

(4) If the number of input channels is an even number (M even) and the number of channels with shortblocks is SB ≥ M/2, then the channels with SB will remain intact.

(5) If the number of input channels is an odd number (M odd) and the number of channels comprising longblocks is LB ≤ INT(M/2), then the channels with LB will be converted to shortblock, SB, channels.

(6) If the number of input channels is an odd number (M odd) and the number of channels comprising longblocks is LB > INT(M/2), then the channels with LB will remain intact.

(7) If the number of input channels is an odd number (M odd) and the number of channels with shortblocks is SB < INT(M/2), then the channels with SB will be converted to longblock, LB, channels.

(8) If the number of input channels is an odd number (M odd) and the number of channels with shortblocks is SB ≥ INT(M/2), then the channels with SB will remain intact.

When one of the above combinations applies, the block-switch forcing method and the downmixing in the frequency-domain (i.e. from M down to P channels) can be performed, once all the channels have the same format, either longblock, LB, or shortblock, SB. This approach can save (M-P) frequency-domain to time-domain transformations, and thus significant processing resources.
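The combinations above can be condensed into a single decision. The Python sketch below is illustrative only: it follows the long-block tests (1), (2), (5) and (6), keeping long blocks when LB > INT(M/2) and forcing short blocks otherwise (for even M, INT(M/2) = M/2). Where the short-block tests (7) and (8) would lead to a different result for odd M, this sketch follows the long-block count. The "LB"/"SB" labels, the function name and the example frame are hypothetical, and the saving is counted as (M-P), as in the text.

```python
def force_block_switch(block_types):
    """Choose one block format for all M channels of a frame.

    block_types: list of "LB" (longblock) or "SB" (shortblock), one per channel.
    Returns (target_format, indices_of_channels_to_convert).
    """
    m = len(block_types)
    lb_count = block_types.count("LB")
    # Keep long blocks only when LB > INT(M/2); otherwise force short blocks.
    target = "LB" if lb_count > m // 2 else "SB"
    to_convert = [i for i, t in enumerate(block_types) if t != target]
    return target, to_convert

# Hypothetical M = 5 frame, e.g. L, C, R, Ls, Rs block-switch states.
frame = ["LB", "LB", "SB", "LB", "SB"]
target, to_convert = force_block_switch(frame)
p = 2  # hypothetical stereo output
print(target, to_convert)                           # LB [2, 4]
print("inverse transforms saved:", len(frame) - p)  # inverse transforms saved: 3
```

Counting only long blocks keeps the decision to one comparison per frame; channels listed in `to_convert` are the ones whose coefficients must be re-arranged before the frequency-domain downmix.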

Considering that:

(a) a long transform length is more suitable for input signals whose spectrum remains stationary or quasi-stationary (this provides greater frequency resolution, improved coding performance and a reduction in the computing power required);

and that:

(b) two or more short-length transforms, possessing greater time resolution, are more desirable for signals whose spectra change rapidly with time (the computing power required for two or more short transforms is generally higher than for a single transform);

the preferred form of channel conversion is from two or more shortblocks, SBs, to a single longblock, LB, because of the lower computing power required. However, the option of converting from one longblock, LB, to two or more shortblocks, SBs, is also within the scope of this invention.

It will be appreciated that, in practice, the selection of block conversion will depend on the actual characteristics of the audio samples being analysed. In other words, if among the M input channels the number of longblock, LB, format channels is higher than the number of shortblock, SB, format channels, this suggests that the particular frame of audio samples is stationary or quasi-stationary in nature and that the shortblocks should be converted to a longblock. On the other hand, if among the M input channels the number of longblock, LB, format channels is smaller than the number of shortblock, SB, format channels, this suggests that the particular frame of audio samples requires higher time-domain resolution and that a longblock should be converted to shortblocks. Any given audio program may have any type of signal content, from purely stationary waveforms to completely random behaviour. However, further simplifications can be obtained if the general nature of the audio program is known a priori, which would allow the audio decoder to determine in advance the most suitable form of block conversion without having to make that determination by examining the received data itself.

Example of the Methodology of the Invention 


a) For converting N frequency-domain audio samples from a longblock, LB, format to a format of two or more shortblocks, SB, the longblock can be split as follows:

$$
\begin{aligned}
\text{SB-1:}\quad & X_0[k] = X[Sk], \\
\text{SB-2:}\quad & X_1[k] = X[Sk+1], \\
& \;\;\vdots \\
\text{SB-}S\text{:}\quad & X_{S-1}[k] = X[Sk+(S-1)], \qquad k = 0, 1, \ldots, N/S - 1
\end{aligned}
$$

The frequency-domain downmixing is then performed, and the frequency-domain to time-domain conversion using shortblocks is applied. Note that S is the number of shortblocks into which the longblock is divided.

The downmixed output can be represented as:

$$
\begin{aligned}
y_0[k] &= \operatorname{downmix}\{X_0[k], X_1[k], \ldots, X_{S-1}[k]\} \\
y_1[k] &= \operatorname{downmix}\{X_0[k], X_1[k], \ldots, X_{S-1}[k]\} \\
&\;\;\vdots \\
y_{P-1}[k] &= \operatorname{downmix}\{X_0[k], X_1[k], \ldots, X_{S-1}[k]\}
\end{aligned}
$$
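The split in a) is a stride-S de-interleave, after which corresponding sub-blocks of the channels can be mixed coefficient by coefficient. The Python sketch below assumes, following the description of the short-block encoder later in this document, that short-block channels arrive with their S sub-blocks interleaved, so the same de-interleave recovers them. Channel contents, gains and function names are hypothetical, and only one output channel is shown.

```python
def deinterleave(coeffs, s):
    """Split one block into S sub-blocks: X_i[k] = X[S*k + i], i = 0..S-1."""
    if len(coeffs) % s:
        raise ValueError("block length must be a multiple of S")
    return [coeffs[i::s] for i in range(s)]

def downmix_subblocks(channels, gains):
    """Mix sub-block i of every channel into sub-block i of one output channel.

    channels[c][i][k] is coefficient k of sub-block i of input channel c.
    """
    n_sub = len(channels[0])
    return [[sum(g * ch[i][k] for g, ch in zip(gains, channels))
             for k in range(len(channels[0][i]))]
            for i in range(n_sub)]

# Hypothetical example: N = 8 coefficients per channel, S = 2 sub-blocks.
long_channel = [0, 1, 2, 3, 4, 5, 6, 7]           # coded as one longblock
short_channel = [10, 20, 30, 40, 50, 60, 70, 80]  # two interleaved shortblocks
lb_sub = deinterleave(long_channel, 2)   # [[0, 2, 4, 6], [1, 3, 5, 7]]
sb_sub = deinterleave(short_channel, 2)  # [[10, 30, 50, 70], [20, 40, 60, 80]]
mixed = downmix_subblocks([lb_sub, sb_sub], gains=[1.0, 0.5])
print(mixed)  # [[5.0, 17.0, 29.0, 41.0], [11.0, 23.0, 35.0, 47.0]]
```

Each mixed sub-block would then be passed to a short (2N/S-point) inverse transform, as described above.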

An inverse (frequency-domain to time-domain) transformation is used to recover the time-domain samples. It is desirable for the number of shortblocks to be a non-prime number, preferably a power of two, so that power-of-two based fast Fourier transforms can be used. However, the general principles are applicable even for an odd or prime number of shortblocks; in these cases a general discrete Fourier transformation may be used.

b) For converting N frequency-domain audio samples from a format of two or more shortblocks, SB, to a longblock, LB, format, the shortblocks are no longer de-interleaved; the frequency-domain downmixing takes place and the same principle of frequency-domain to time-domain conversion is applied using a longblock.

Thus, as mentioned, the frequency-domain downmixing from M input channels to P output channels is performed before the frequency-domain to time-domain conversion is applied, which reduces both the computing power required for the audio decoder function and the memory used for the conversion.

The invention is described in greater detail hereinbelow, by way of example only, with reference to the accompanying drawings, wherein:

Figure 1 is a general block diagram of an encoder and decoder system for audio compression in a multi-channel configuration;

Figure 2 is a block diagram of the decoder function of the audio system, which includes bit parsing and time-domain aliasing cancellation sections;

Figure 3 is a general block diagram of a prior art audio decoder configured for downmixing;

Figure 4 is a more detailed block diagram of the audio decoder of Figure 3, showing interconnected transformation, downmixing, overlap-and-add technique and windowing blocks;

Figure 5 shows a practical implementation of the overlap-and-add technique involving windowing;

Figure 6 shows the implementation of Figure 5 in block diagram form;

Figure 7 is a general block diagram of an audio decoder according to an embodiment of the invention, showing interconnected block-switch selection and downmixing, transformation, overlap-and-add technique and windowing blocks;

Figure 8 shows the implementation of the frequency-domain downmixing prior to the time-domain conversion by the inverse transform, with the frequency-domain coefficients forced to be transformed using two or more inverse transforms;

Figure 9 shows the implementation of the frequency-domain downmixing prior to the time-domain conversion by the inverse transform, with the frequency-domain coefficients forced to be transformed using a single inverse transform; and

Figure 10 is a flow diagram illustrating the general procedure for audio decoding according to embodiments of the invention.

For audio signals of a stationary or quasi-stationary nature, the PCM audio signals are partitioned into sections of 2N time-domain audio samples. The block diagram of Figure 1 shows an example of the methodology of frequency-domain to time-domain conversion, which involves windowing and an overlap-and-add technique to recover the PCM audio samples. This technique is described, for example, in "The Fast Fourier Transform" (E.O. Brigham, Prentice-Hall Inc., pp 206-221), the contents of which are incorporated herein by reference. Figure 2 shows the decoder function of the audio system, which includes the bit parsing and the time-domain aliasing cancellation sections. In these configurations, the number of output channels from the decoder equals the number of input channels contained in the serial bit stream, and thus no downmixing is required.

In many reproduction systems, the number of output channels (loudspeakers), P, will be smaller than the number of encoded audio channels, M (i.e. M>P). In order to reproduce the complete audio program, downmixing is required. Downmixing can be performed in the time-domain. However, since the inverse transform is a linear operation, downmixing can also be performed in the frequency-domain prior to transformation. Downmixing coefficients are needed to keep the downmixed output at the correct levels without driving the output channels beyond their range, and the downmixing coefficients may vary from one audio program to another, as is readily apparent to those of ordinary skill in the art. The downmixing coefficients also allow program producers to monitor and make the necessary alterations to their programs so that acceptable results are achieved for all types of listeners, from professional audio enthusiasts to consumer electronics and multimedia audiences.
Figure 3 is a block diagram showing another prior art audio decoder construction, in this case requiring a downmixing function in order to provide the audio output through fewer channels than were used to encode the audio data originally. The multi-channel input section is downmixed to a multi-channel output in which the number of output channels is smaller than the number of input channels. The block diagram of Figure 4 illustrates the interconnections of the transformation, downmixing, overlap-and-add and windowing blocks as used in prior art audio decoding and downmixing constructions. An example of this form of construction is described in United States Patent Number 5,400,433, assigned to Dolby Laboratories Licensing Corporation. It should be noted that in this form of audio decoding and downmixing, because the downmixing is performed on the time-domain audio data, each of the frequency-domain channels must be inverse transformed, requiring significant computational processing power.

The overlap-and-add and windowing techniques mentioned above are illustrated by the following example, in which 2N=512, such that a longblock, LB, comprises 512 time-domain samples and a shortblock, SB, comprises 256 samples.

The frequency-domain coefficients are represented by:

$$X[k], \qquad k = 0, 1, \ldots, N-1$$

These frequency-domain coefficients are augmented with zeroes to form one period (i.e. 2N points) of a periodic function, in order to eliminate overlap effects. In particular, N is chosen as $N = 2^{\gamma}$, with $\gamma$ an integer, and the remaining $Q = 2N - N = N$ values are zeroes. The addition of the Q zeroes ensures that there will be no end effects. The computation procedure for the inverse fast Fourier transform (IFFT) convolution, overlap-and-add method is detailed below.

Form the sampled periodic function $\hat{X}[k]$:

$$
\hat{X}[k] =
\begin{cases}
X[k], & k = 0, 1, \ldots, N-1 \\
0, & k = N, N+1, \ldots, 2N-1
\end{cases}
$$

Compute the inverse fast Fourier transform (IFFT) of $\hat{X}[k]$; for the i-th section this gives:

$$
z_i[n] = \sum_{k=0}^{N-1} X_i[k]\, e^{j 2\pi n k/(2N)}, \qquad n = 0, 1, \ldots, 2N-1
$$

Repeat the same steps for the next period and combine the sectioned results according to:

$$
\begin{aligned}
z[n] &= z_1[n], & n &= 0, 1, \ldots, 2N-Q \\
z[n + 2N-Q+1] &= z_1[n + 2N-Q+1] + z_2[n], & n &= 0, 1, \ldots, 2N-Q \\
z[n + 2(2N-Q+1)] &= z_2[n + 2N-Q+1] + z_3[n], & n &= 0, 1, \ldots, 2N-Q
\end{aligned}
$$

and so on.
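The sectioned procedure above can be sketched in Python as follows. This is a minimal illustration, assuming a naive inverse DFT in place of an IFFT and treating the offset between successive sections (2N-Q+1 in the equations above) as a plain `hop` parameter; the helper names and the two input sections are hypothetical.

```python
import cmath

def zero_pad(coeffs, total_len):
    """Append zeroes so that the N coefficients fill one 2N-point period."""
    return list(coeffs) + [0] * (total_len - len(coeffs))

def idft(coeffs):
    """Naive, unnormalised inverse DFT (stand-in for an IFFT)."""
    n_pts = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * n / n_pts)
                for k, c in enumerate(coeffs))
            for n in range(n_pts)]

def overlap_add(sections, hop):
    """Add each section into the output, starting hop samples after the previous one."""
    out = [0j] * (hop * (len(sections) - 1) + len(sections[-1]))
    for i, sec in enumerate(sections):
        for n, value in enumerate(sec):
            out[i * hop + n] += value
    return out

N = 2
Q = N                      # Q = 2N - N zeroes, as in the text
blocks = [[1, 0], [1, 0]]  # two hypothetical sections of N coefficients
sections = [idft(zero_pad(b, 2 * N)) for b in blocks]
z = overlap_add(sections, hop=2 * N - Q + 1)
print([round(v.real, 6) for v in z])  # [1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
```

Only the sample where the two sections overlap receives a contribution from both, which is the "sectioned results" combination expressed by the equations.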



For audio signals of a random or dynamic nature, the PCM audio signals are partitioned into sections of 2N time-domain audio samples, and two or more sections are taken per frame.

Figure 5 shows a practical implementation of the overlap-and-add technique involving windowing. N frequency-domain coefficients are obtained from the encoder: N/2 of these coefficients correspond to the real part and N/2 to the imaginary part (i.e. there are N/2 complex coefficients). A pre-twiddle operation is first performed on these coefficients before they are converted into the time-domain using an N/2-point IFFT. A post-twiddle operation is then performed on the resulting time-domain samples before windowing. The real part of the time-domain samples is windowed first to produce: the odd frequencies of the lowest N/4 section (OLL); the odd frequencies of the highest N/4 section (OHH); and the even frequencies of the middle N/2 section (EHL and ELH). The imaginary part of the time-domain samples is then windowed to produce: the even frequencies of the highest N/4 section (EHH); the even frequencies of the lowest N/4 section (ELL); and the odd frequencies of the middle N/2 section (OLH and OHL). Figure 6 shows the same implementation in block diagram form.

In the following mathematical example it is assumed that the N/2=256 transform coefficients received by the TDSP block were obtained in the encoder section from 2N=512 real time-domain audio samples. With this assumption, some simplifications can be obtained by working in the frequency-domain.

For the practical implementation, assume that the block length is such that N=512, and that 128 complex-valued transform coefficients were obtained from a 128-sample real-valued input sequence, with 128 zeroes taken for the imaginary part.

Define the frequency-domain transform coefficients:

$$
X[k] =
\begin{cases}
X_R[k], & k = 0, 1, \ldots, 127 \\
X_I[k], & k = 128, 129, \ldots, 255
\end{cases}
$$

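Compute the N/4-point complex multiplication product (pre-twiddle):

$$
Z[k] = \bigl(X[N/2 - 2k - 1]\,\mathrm{xcos1}[k] - X[2k]\,\mathrm{xsin1}[k]\bigr) + j\bigl(X[2k]\,\mathrm{xcos1}[k] + X[N/2 - 2k - 1]\,\mathrm{xsin1}[k]\bigr), \qquad k = 0, 1, \ldots, 127
$$

where

$$
\mathrm{xcos1}[k] = -\cos\!\left(\frac{2\pi(8k+1)}{8N}\right), \qquad \mathrm{xsin1}[k] = -\sin\!\left(\frac{2\pi(8k+1)}{8N}\right)
$$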



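Compute the N/4-point complex IFFT:

$$
z[n] = \sum_{k=0}^{N/4-1} Z[k]\left(\cos\frac{8\pi k n}{N} + j\sin\frac{8\pi k n}{N}\right), \qquad n = 0, 1, \ldots, 127
$$

Compute the N/4-point complex multiplication product (post-twiddle):

$$
y[n] = \bigl(z_r[n]\,\mathrm{xcos1}[n] - z_i[n]\,\mathrm{xsin1}[n]\bigr) + j\bigl(z_i[n]\,\mathrm{xcos1}[n] + z_r[n]\,\mathrm{xsin1}[n]\bigr), \qquad n = 0, 1, \ldots, 127
$$

where $z_r[n] = \mathrm{real}(z[n])$ and $z_i[n] = \mathrm{imag}(z[n])$.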
Compute the windowed time-domain samples for $n = 0, 1, \ldots, 63$ (i.e. $n < N/8$), where $y_r[n]$ and $y_i[n]$ are the real and imaginary parts of $y[n]$ and $w$ is the window:

$$
\begin{aligned}
x[2n] &= -y_i[N/8 + n]\, w[2n] \\
x[2n+1] &= y_r[N/8 - n - 1]\, w[2n+1] \\
x[N/4 + 2n] &= -y_r[n]\, w[N/4 + 2n] \\
x[N/4 + 2n + 1] &= y_i[N/4 - n - 1]\, w[N/4 + 2n + 1] \\
x[N/2 + 2n] &= -y_r[N/8 + n]\, w[N/2 - 2n - 1] \\
x[N/2 + 2n + 1] &= y_i[N/8 - n - 1]\, w[N/2 - 2n - 2] \\
x[3N/4 + 2n] &= y_i[n]\, w[N/4 - 2n - 1] \\
x[3N/4 + 2n + 1] &= -y_r[N/4 - n - 1]\, w[N/4 - 2n - 2]
\end{aligned}
$$
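The four steps above (pre-twiddle, N/4-point complex IFFT, post-twiddle, windowing) can be collected into one routine. The Python sketch below is a direct transcription of the equations under stated assumptions: a naive O((N/4)^2) sum replaces the IFFT, the input X holds the N/2 real values X[0..N/2-1], and the window w is supplied by the caller as its first N/2 values. The window table itself is not given in this document, so the usage example uses a hypothetical all-ones window; the function name is illustrative.

```python
import math

def inverse_transform_and_window(X, w):
    """Pre-twiddle, N/4-point complex IFFT, post-twiddle and windowing.

    X: N/2 real transform coefficients (N a multiple of 8).
    w: first N/2 values of the window.
    Returns the N windowed time-domain samples x[0..N-1].
    """
    N = 2 * len(X)
    q, e = N // 4, N // 8
    xcos1 = [-math.cos(2 * math.pi * (8 * k + 1) / (8 * N)) for k in range(q)]
    xsin1 = [-math.sin(2 * math.pi * (8 * k + 1) / (8 * N)) for k in range(q)]

    # Pre-twiddle: N/4 complex products Z[k].
    Z = [complex(X[N // 2 - 2 * k - 1] * xcos1[k] - X[2 * k] * xsin1[k],
                 X[2 * k] * xcos1[k] + X[N // 2 - 2 * k - 1] * xsin1[k])
         for k in range(q)]

    # N/4-point complex IFFT, written as a direct sum for clarity.
    z = [sum(Z[k] * complex(math.cos(8 * math.pi * k * n / N),
                            math.sin(8 * math.pi * k * n / N))
             for k in range(q))
         for n in range(q)]

    # Post-twiddle: real and imaginary parts of y[n].
    yr = [z[n].real * xcos1[n] - z[n].imag * xsin1[n] for n in range(q)]
    yi = [z[n].imag * xcos1[n] + z[n].real * xsin1[n] for n in range(q)]

    # Windowing: eight interleaved assignments per n, n = 0..N/8-1.
    x = [None] * N
    for n in range(e):
        x[2 * n] = -yi[e + n] * w[2 * n]
        x[2 * n + 1] = yr[e - n - 1] * w[2 * n + 1]
        x[q + 2 * n] = -yr[n] * w[q + 2 * n]
        x[q + 2 * n + 1] = yi[q - n - 1] * w[q + 2 * n + 1]
        x[N // 2 + 2 * n] = -yr[e + n] * w[N // 2 - 2 * n - 1]
        x[N // 2 + 2 * n + 1] = yi[e - n - 1] * w[N // 2 - 2 * n - 2]
        x[3 * q + 2 * n] = yi[n] * w[q - 2 * n - 1]
        x[3 * q + 2 * n + 1] = -yr[q - n - 1] * w[q - 2 * n - 2]
    return x

# Hypothetical small example: N = 16, so X has 8 values and w has 8 values.
X = [0.5, -0.25, 1.0, 0.0, 0.75, -1.0, 0.125, 0.3]
x = inverse_transform_and_window(X, [1.0] * 8)
print(len(x))  # 16
```

Every output index is written exactly once by the eight assignments, and the whole routine is linear in X, which is what allows the downmix to be moved ahead of it.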




The first half of the windowed block is overlapped with the second half of the previous block, and the two halves are added sample-by-sample to produce the PCM output audio samples. This implementation is represented step-by-step in Figure 5, where N=512 and the blocks shown represent the data at various stages of the process, which progresses down the page.
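The overlap-and-add step can be sketched as a small delay line. In the Python sketch below, each windowed block of N samples contributes its first half to the current output, added to the stored second half of the previous block; the stored half is then replaced. The zero-filled delay before the first block is an assumption, as is the absence of any output scaling; the function name and example blocks are hypothetical.

```python
def overlap_add_stream(windowed_blocks):
    """Yield N/2 PCM samples per windowed block of N samples."""
    delay = None
    for block in windowed_blocks:
        half = len(block) // 2
        if delay is None:
            delay = [0.0] * half  # assumption: silence before the first block
        # First half of this block + second half of the previous block.
        yield [a + b for a, b in zip(block[:half], delay)]
        delay = block[half:]

blocks = [[1, 2, 3, 4], [10, 20, 30, 40]]  # hypothetical windowed blocks
print(list(overlap_add_stream(blocks)))    # [[1.0, 2.0], [13, 24]]
```

Only N/2 samples of state are kept between blocks, so each new block yields N/2 new PCM samples.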

A similar practical implementation is obtained when two or more shortblocks are transmitted. The difference lies in the inverse transformation block size used: the transform block size is divided by the number of shortblocks considered. In this case, the N/2=256 transform coefficients received by the TDSP were also obtained from 2N=512 real-valued time-domain audio samples.

The difference here is that the first 256 real-valued time-domain samples are taken and converted into the frequency-domain using a 128-point FFT, which provides only 128 complex transform coefficients. The second 256 real-valued time-domain samples follow the same procedure. Finally, the two blocks of 128 complex coefficients are interleaved to form the 256 complex transform coefficients.
Since the first $N/2 - 1$ frequency components are an exact mirror of the second $N/2 - 1$ components, only $N/2$ coefficients are transmitted (i.e. a 128-value real block and a 128-value imaginary block, one after the other).
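The mirror property used above is the conjugate symmetry of the discrete Fourier transform of a real sequence: X[L-k] equals the complex conjugate of X[k], so the upper half of the spectrum carries no new information. The short Python check below uses a naive DFT and an arbitrary real test sequence; it illustrates the symmetry only and is not the encoder's transform.

```python
import cmath

def dft(samples):
    """Naive forward DFT of a list of samples."""
    n_pts = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n, x in enumerate(samples))
            for k in range(n_pts)]

samples = [0.5, -1.0, 2.0, 0.25, 0.0, 1.5, -0.75, 1.0]  # arbitrary real input
spectrum = dft(samples)
L = len(samples)

# X[L - k] == conj(X[k]) for k = 1..L-1, so bins above L/2 mirror those below.
mirrored = all(abs(spectrum[L - k] - spectrum[k].conjugate()) < 1e-9
               for k in range(1, L))
print(mirrored)  # True
```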

The interconnection of the block-switch selection and downmixing, transformation, overlap-and-add and windowing sections, according to an embodiment of the present invention, is illustrated in Figure 7. Figure 8 shows the implementation of the frequency-domain downmixing prior to the time-domain conversion by the inverse transform, in the case where the frequency-domain coefficients are forced to be transformed using two or more inverse transforms. The case where two or more short blocks of frequency-domain coefficients are forced to be transformed using a single inverse transform is illustrated in Figure 9.

Referring to Figures 8 and 9, which illustrate processing procedures of the preferred embodiment, N real-valued or complex-valued audio samples are taken and placed back-to-back with the N real-valued or complex-valued audio samples of the previous block to form a 2N-sample block (Figure 8). Based on transient detection, which is used to determine when to switch from a long transform block to short transform blocks, each audio block is transformed into the frequency-domain by performing either one long 2N-point transform or two or more short 2N/S-point transforms, where S is the number of sections into which the long block is divided. At the end of this step, N real-valued or complex-valued transform coefficients are transmitted.

For real-valued audio samples, the same procedure applies, but the number of transform coefficients transmitted is halved. This is because the frequency-domain coefficients are mirrored from the DC component to N/4 and from N/4 to N/2. In this case only N/2 complex-valued coefficients are transmitted.

At the decoder side, two scenarios are encountered. In the first scenario, the N/2 complex-valued coefficients of one channel were obtained by performing one long 2N-point transform at the encoder section, and these coefficients need to be downmixed with the N/2 complex-valued coefficients of other channels which were obtained by performing two or more 2N/S-point transforms at the encoder section. The solution is to de-interleave the coefficients of the former channel into the required number of sections, S. The frequency-domain downmixing is then applied and the output channels are obtained. The coefficients of each of these channels are padded with N/S zeroes, and the inverse Fourier transform is applied to each of them. A window function is used to reduce the effects of block Fourier transformation, and the overlap-and-add method is applied to recover the original audio samples.

In the second scenario, the N/2 complex-valued coefficients of one channel were obtained by performing two or more 2N/S-point transforms at the encoder section, and these coefficients need to be downmixed with the N/2 complex-valued coefficients of other channels which were obtained by performing one long 2N-point transform at the encoder section. The solution here is to de-interleave the coefficients of the former channel and insert (S-1) zeroes between the de-interleaved coefficients. The frequency-domain downmixing is then applied and the output channels are obtained. The inverse Fourier transform is applied to the coefficients of each of these channels, a window function is used to reduce the effects of block Fourier transformation, and the overlap-and-add method is applied to recover the original audio samples.

The general procedure of audio decoding according to embodiments of the invention is illustrated in block diagram form in Figure 10. The procedure begins with the reception by the audio decoder of a frame of encoded audio data. As mentioned, this encoded audio data frame may typically originate from either a transmission or a storage system, and form part of a serial bit stream. The encoded audio data frame comprises a plurality of blocks of data corresponding to separate channels in the audio program, and the blocks are multiplexed together in the frame in a known way. Thus, after receiving the frame, the audio decoder proceeds to de-multiplex the frame into the plural (M, M an integer > 1) data blocks corresponding to audio data channels. The audio data in each data block is encoded in the frequency-domain, and the method by which it was transformed from the time-domain audio samples to the frequency-domain audio data may vary, depending in particular upon the time-varying nature of the original audio signal frequency spectrum. For audio signals in which the frequency spectrum remains stationary or quasi-stationary, the PCM samples may typically be transformed in long blocks using a relatively long fast Fourier transform length, for example. This is advantageous in that one long transform requires less computing power than the two or more shorter transforms covering the same block. However, if the audio frequency spectrum of the signal changes relatively rapidly with time, the performance of the audio system can be significantly enhanced if the audio signals are encoded using shorter audio data sample blocks and correspondingly shorter transform lengths.

Once the audio data frame has been de-multiplexed into its constituent data channel
components, each channel (data block) is examined by the decoder to determine the method
by which the audio data in the block was transformed from the time domain to the frequency
domain. This might typically be accomplished by examining a sub-block-size flag or the like
transmitted as part of the data block or in the frame as a whole. Of the M channels comprising
the audio data frame, the decoder tallies the number encoded using a short transform length
and the number encoded using a long transform length.

As discussed hereinabove, computing resources can be saved if long transforms are employed,
and this applies equally well to the inverse transformations which take place at the decoder.
Thus, if it is possible to decode an audio channel using a long inverse transformation, this is
preferable from the computing resources viewpoint, even if in some instances the corresponding
data block was initially encoded in several short sub-blocks using a short transform length. The
use of a particular inverse transform length to decode data encoded using a different transform
length is referred to herein as block-switch forcing. To minimise computing resources in the
decoder it is preferred that the inverse transform be forced to longer blocks more often; however,
forcing a shorter (and thus computationally more expensive) inverse transform where a long
transform was used for encoding is also within the ambit of the invention.
Care must be taken that the audio quality is not degraded significantly by block-switch forcing
to a long inverse transform length where a short transform would ordinarily be appropriate.
Accordingly, the following guidelines are used to select the form of forced block-length
switching, based on the relative numbers of channels in the audio data frame which were
encoded using short and long blocks.

(1) If the total number of channels is even (M even) and the number of channels
comprising long blocks is LB ≤ M/2, then the channels with LB will be converted to
short-block, SB, channels.

(2) If the total number of channels is even (M even) and the number of channels
comprising long blocks is LB > M/2, then the channels with LB will remain intact.

(3) If the total number of channels is even (M even) and the number of channels
with short blocks is SB < M/2, then the channels with SB will be converted to
long-block, LB, channels.

(4) If the total number of channels is even (M even) and the number of channels
with short blocks is SB ≥ M/2, then the channels with SB will remain intact.

(5) If the total number of channels is odd (M odd) and the number of channels
comprising long blocks is LB ≤ INT(M/2), then the channels with LB will be converted to
short-block, SB, channels.

(6) If the total number of channels is odd (M odd) and the number of channels
comprising long blocks is LB > INT(M/2), then the channels with LB will remain intact.

(7) If the total number of channels is odd (M odd) and the number of channels
with short blocks is SB < INT(M/2), then the channels with SB will be converted to
long-block, LB, channels.

(8) If the total number of channels is odd (M odd) and the number of channels
with short blocks is SB ≥ INT(M/2), then the channels with SB will remain intact.
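Rules (1) to (8) can be condensed into a single decision function, including the tally of long-block and short-block channels. This is a sketch: the Block enumeration and the function name are hypothetical, and the per-channel flags are assumed to have already been read from the sub-block-size flags described above. For even M the rules always yield a uniform format; for odd M, the case SB = INT(M/2) satisfies rules (6) and (8) but neither conversion rule, so the sketch leaves those channels as they are.

```python
from enum import Enum


class Block(Enum):
    """Hypothetical per-channel transform-length flag."""
    LONG = "long"
    SHORT = "short"


def force_block_lengths(flags: list) -> list:
    """Apply block-switch forcing rules (1)-(8) to the M channel flags."""
    m = len(flags)
    half = m // 2  # equals M/2 for even M and INT(M/2) for odd M
    lb = sum(1 for f in flags if f is Block.LONG)  # tally of long-block channels
    sb = m - lb                                    # tally of short-block channels

    if lb <= half:
        # Rules (1) and (5): long-block channels are converted to short blocks;
        # rules (4) and (8) keep the short-block channels intact.
        return [Block.SHORT] * m
    if sb < half:
        # Rules (3) and (7): short-block channels are converted to long blocks;
        # rules (2) and (6) keep the long-block channels intact.
        return [Block.LONG] * m
    # Only reachable for odd M with SB == INT(M/2): rules (6) and (8) both
    # say "remain intact", so no channel is converted.
    return list(flags)
```

For example, a six-channel frame with four long-block and two short-block channels is decoded entirely with the long inverse transform, while a six-channel frame with three of each is decoded entirely with the short one.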

The downmixing of the audio data channels from M channels to P channels (M > P) is
performed using a frequency-domain downmixing table, as discussed hereinabove and as is
known to those in the relevant art. As mentioned, the values of the coefficients in the
downmixing table may vary from one application to another, for example depending upon the
nature of the audio program to be decoded and downmixed.
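Because the inverse transform is linear, applying the downmixing table to the frequency-domain coefficients gives the same time-domain result as downmixing after inverse transformation, provided every channel's coefficients share the same block format; this is why block-switch forcing precedes the downmix. The sketch below treats the downmixing table as a P x M matrix. The function name and the 3-to-2 table values are illustrative assumptions, not coefficients taken from this specification.

```python
import numpy as np


def downmix(channels: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Downmix M frequency-domain channels to P channels.

    channels: shape (M, K), one row of K coefficients per input channel,
              all in the same (forced) block format.
    table:    shape (P, M), table[p, m] is the gain of input m in output p.
    """
    m, _ = channels.shape
    p, m_table = table.shape
    if m_table != m or not 0 < p < m:
        raise ValueError("table must be P x M with 0 < P < M")
    # Each output channel is a weighted sum of the input channels, bin by bin.
    return table @ channels


# Illustrative 3-to-2 table for inputs ordered (left, centre, right).
example_table = np.array([[1.0, 0.5, 0.0],
                          [0.0, 0.5, 1.0]])
```

With this table, coefficients of 1, 2 and 3 in the left, centre and right channels produce 2 and 4 in the two output channels.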

Following the downmixing, the P downmixed audio channels are inverse transformed
from the frequency domain to the time domain so as to obtain PCM coded audio samples
which can be used to reproduce the audio program. The form of the inverse
transformation employed (e.g. short or long) is determined by the preceding block-switch
forcing mode selection. Following the inverse transformation, the audio data samples may be
subjected to overlap-and-add and windowing procedures as known in the art
and discussed in some detail hereinabove. This places the decoded audio data in a condition
for reproduction by an audio reproduction system, in the form of P decoded and downmixed
channels suitable for the particular reproduction system.
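The inverse transformation, windowing and overlap-and-add steps for one downmixed channel can be sketched as below. The sketch makes several assumptions that are not stated in this specification: it uses a plain inverse real FFT (numpy.fft.irfft) in place of whatever inverse transform a particular codec defines, a periodic Hann window, and 50% overlap between consecutive blocks. With that window and hop, the overlapping windows sum to one, so a steady signal is reconstructed without amplitude ripple. The parameter block_len stands for the long or short length chosen by the block-switch forcing step.

```python
import numpy as np


def synthesize_channel(spectra: np.ndarray, block_len: int) -> np.ndarray:
    """Inverse-transform, window and overlap-add one channel's blocks.

    spectra:   shape (num_blocks, block_len // 2 + 1), one real-FFT spectrum
               per block (hypothetical layout).
    block_len: even inverse transform length selected by block-switch forcing.
    """
    if block_len % 2:
        raise ValueError("block_len must be even for 50% overlap")
    hop = block_len // 2
    n = np.arange(block_len)
    # Periodic Hann window: w[n] + w[n + hop] == 1 for every n < hop.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / block_len)
    out = np.zeros(hop * (len(spectra) + 1))
    for i, spectrum in enumerate(spectra):
        block = np.fft.irfft(spectrum, n=block_len)  # frequency -> time domain
        out[i * hop : i * hop + block_len] += window * block  # overlap-and-add
    return out
```

The first and last half-blocks of the output are only partially covered by windows; in a running decoder they would be completed by the neighbouring frames.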

It will be immediately apparent to those skilled in the art that the principles of the present
invention can be implemented in several different ways, including in software controlling
general-purpose computational apparatus. The preferred implementation is, however, a
dedicated audio decoding integrated circuit in which the principles of the invention are
embodied in hard-wired circuitry or in firmware provided for controlling portions of the
overall audio decoder. Other forms of implementation will no doubt also be apparent to those
in the art, and it is intended that such forms not be excluded from the present invention where
the principles described herein are nevertheless employed.

Performance measurements comparing this invention with previous audio decoding
implementations show only negligible degradation. This degradation should nevertheless be
taken into account when implementing the invention on a particular hardware/software platform.

Figure 8 shows the frequency-domain downmixing prior to transformation. The M input
channels are analysed to determine how many have block switching enabled and how many have
it disabled. A decision is then made as to whether some of the channels need to be converted,
by block-switch forcing, to the long-block or short-block format. The frequency-domain
coefficients of all channels are forced to have the same format, and the downmix coefficients
are used to obtain P output channels. The coefficients of the P channels are then inverse
transformed to the time domain, and the windowing and overlap-and-add technique is applied
to recover the PCM output audio samples.

The foregoing detailed description of the invention has been presented by way of example
only, and is not intended to be considered limiting to the invention as defined in the claims
appended hereto.
CLAIMS:

1. An audio decoder, comprising: 

a demultiplexer for receiving a data signal and demultiplexing the data signal into a
plurality of M frequency-domain input data channels;

means for downmixing said M frequency-domain input channels into P frequency-domain
channels, where M > P and P > 0, M and P both integers; and

means for selecting an inverse transformation length and performing an inverse
transformation of the P frequency-domain channels according to the selected length, so as to
produce P audio sample output channels.

2. An audio decoder as claimed in claim 1, wherein the means for selecting and 
performing an inverse transformation is biased to the selection of a long transform length. 

3. An audio decoder as claimed in claim 1 or 2, further including means for determining
a transformation length of each of said M frequency-domain input channels. 

4. An audio decoder as claimed in claim 3, wherein the inverse transform length is 
selected according to the transformation lengths of the M frequency-domain input channels. 

5. An audio decoder as claimed in claim 4, wherein the transformation length of the M 
frequency-domain input channels comprises a long transform length or a short transform 
length. 

6. An audio decoder as claimed in claim 5, wherein if the number of input channels
having a long transform length is less than or equal to the integer value of M/2, then the 
inverse transformation of the P frequency-domain channels is performed using a short selected 
inverse transformation length. 
7. An audio decoder as claimed in claim 5, wherein if the number of input channels
having a short transform length is less than the integer value of M/2, then the inverse 
transformation of the P frequency-domain channels is performed using a long selected inverse 
transformation length. 

8. A method of audio data decoding, comprising: 

receiving a data signal and demultiplexing the data signal into a plurality of M 
frequency-domain input data channels; 

downmixing said M frequency-domain input channels into P frequency-domain 
channels, where M > P and P > 0, M and P both integers; and

selecting an inverse transformation length and performing an inverse transformation 
of the P frequency-domain channels according to the selected length, so as to produce P audio 
sample output channels. 

9. A method of audio data decoding as claimed in claim 8, further including a step of
determining a transformation length of each of said M frequency-domain input channels. 

10. A method of audio data decoding as claimed in claim 8 or 9, wherein the selection of 
an inverse transformation length is biased to the selection of a long transform length. 

11. A method of audio data decoding as claimed in claim 9, wherein the inverse transform 
length is selected according to the transformation lengths of the M frequency-domain input 
channels. 

12. A method of audio data decoding as claimed in claim 11, wherein the transformation
length of the M frequency-domain input channels comprises a long transform length or a short
transform length.

13. A method of audio data decoding as claimed in claim 12, wherein if the number of 
input channels having a long transform length is less than or equal to the integer value of 
M/2, then the inverse transformation of the P frequency-domain channels is performed using
a short selected inverse transformation length. 



14. A method of audio data decoding as claimed in claim 12, wherein if the number of 
input channels having a short transform length is less than the integer value of M/2, then the
inverse transformation of the P frequency-domain channels is performed using a long selected 
inverse transformation length. 